UniNE at CLEF 2009: Persian Ad Hoc Retrieval and IP

Authors

  • Ljiljana Dolamic
  • Claire Fautsch
  • Jacques Savoy
Abstract

This paper describes the participation of the University of Neuchâtel in the CLEF 2009 evaluation campaign. In the Persian ad hoc task, we suggest using a light suffix-stripping algorithm for the Farsi language; the evaluations demonstrated that such an approach performs better than a simple light stemmer, an approach ignoring the stemming stage, or a language-independent approach (n-gram). Blind query expansion (e.g., Rocchio's model) may improve retrieval effectiveness, and combining different indexing and search strategies may further enhance the mean average precision (MAP). In the Intellectual Property (IP) task, we tried different strategies to select and weight pertinent words extracted from a patent description in order to form an effective query. We also evaluated different search models and found that probabilistic models tend to perform better than vector-space schemes.

Similar resources

German, French, English and Persian Retrieval Experiments at CLEF 2009

We describe evaluation experiments conducted by submitting retrieval runs for the monolingual German, French, English and Persian (Farsi) information retrieval tasks of the Ad Hoc Track of the Cross-Language Evaluation Forum (CLEF) 2009. In the ad hoc retrieval tasks, the system was given 50 natural language queries, and the goal was to find all of the relevant records or documents (with high p...

JHU Experiments in Monolingual Farsi Document Retrieval at CLEF 2009

At CLEF 2009 JHU submitted runs in the ad hoc track for the monolingual Persian evaluation. Variants of character n-gram tokenization provided a 10% relative gain over unnormalized words. A run based on skip n-grams, which allow internal skipped letters, achieved a mean average precision of 0.4938. Using traditional 5-grams resulted in a score of 0.4868 while plain words had a score of 0.4463.

Ad Hoc Retrieval with the Persian Language

This paper describes our participation in the Persian ad hoc search during the CLEF 2009 evaluation campaign. In this task, we suggest using a light suffix-stripping algorithm for the Farsi (or Persian) language. The evaluations based on different probabilistic models demonstrated that our stemming approach performs better than a stemmer removing only the plural suffixes, or statistically bette...

Ad Hoc Information Retrieval for Persian

In this paper we present an introduction to the Persian language and its morphology, and describe available resources for Persian text processing. We then propose and evaluate an information retrieval model, a variation of the vector space model which uses the relations existing between query terms. Our experiments on the Hamshahri collection show that the proposed model has better precision fo...

CLEF 2009 Ad Hoc Track Overview: TEL and Persian Tasks

The 2009 Ad Hoc track was to a large extent a repetition of last year’s track, with the same three tasks: Tel@CLEF, Persian@CLEF, and Robust-WSD. In this first of the two track overviews, we describe the objectives and results of the TEL and Persian tasks and provide some statistical analyses.

Publication date: 2009